
Load the tweets and check that they were loaded correctly. We also inspect the summary for a first interpretation; the summary(tweets) output reveals the following:

# Set working directory
# getwd()
# setwd("./data/")

# Load data
load("../data/Tweets_all.rda")

# Check that tweets are loaded
head(tweets)
## # A tibble: 6 × 14
##   created_at               id id_str            full_text in_reply_to_screen_n…¹
##   <dttm>                <dbl> <chr>             <chr>     <chr>                 
## 1 2023-01-20 17:17:32 1.62e18 1616469988369469… "Im MSc … <NA>                  
## 2 2023-01-13 07:52:01 1.61e18 1613790954737074… "Was bew… <NA>                  
## 3 2023-01-12 19:30:01 1.61e18 1613604227141537… "Was uns… <NA>                  
## 4 2023-01-12 08:23:00 1.61e18 1613436367169634… "Eine di… <NA>                  
## 5 2023-01-11 14:00:05 1.61e18 1613158809081450… "Wir gra… <NA>                  
## 6 2023-01-10 17:06:11 1.61e18 1612843252083834… "Unsere … <NA>                  
## # ℹ abbreviated name: ¹​in_reply_to_screen_name
## # ℹ 9 more variables: retweet_count <int>, favorite_count <int>, lang <chr>,
## #   university <chr>, tweet_date <dttm>, tweet_minute <dttm>,
## #   tweet_hour <dttm>, tweet_month <date>, timeofday_hour <chr>
summary(tweets)
##    created_at                          id               id_str         
##  Min.   :2009-09-29 14:29:47.0   Min.   :4.469e+09   Length:19575      
##  1st Qu.:2015-01-28 15:07:41.5   1st Qu.:5.604e+17   Class :character  
##  Median :2018-04-13 13:26:56.0   Median :9.848e+17   Mode  :character  
##  Mean   :2017-12-09 15:26:50.7   Mean   :9.400e+17                     
##  3rd Qu.:2020-10-20 10:34:50.0   3rd Qu.:1.318e+18                     
##  Max.   :2023-01-26 14:49:31.0   Max.   :1.619e+18                     
##   full_text         in_reply_to_screen_name retweet_count     favorite_count  
##  Length:19575       Length:19575            Min.   :  0.000   Min.   :  0.00  
##  Class :character   Class :character        1st Qu.:  0.000   1st Qu.:  0.00  
##  Mode  :character   Mode  :character        Median :  1.000   Median :  0.00  
##                                             Mean   :  1.289   Mean   :  1.37  
##                                             3rd Qu.:  2.000   3rd Qu.:  2.00  
##                                             Max.   :267.000   Max.   :188.00  
##      lang            university          tweet_date                    
##  Length:19575       Length:19575       Min.   :2009-09-29 00:00:00.00  
##  Class :character   Class :character   1st Qu.:2015-01-28 00:00:00.00  
##  Mode  :character   Mode  :character   Median :2018-04-13 00:00:00.00  
##                                        Mean   :2017-12-09 02:25:45.00  
##                                        3rd Qu.:2020-10-20 00:00:00.00  
##                                        Max.   :2023-01-26 00:00:00.00  
##   tweet_minute                      tweet_hour                    
##  Min.   :2009-09-29 14:29:00.00   Min.   :2009-09-29 14:00:00.00  
##  1st Qu.:2015-01-28 15:07:00.00   1st Qu.:2015-01-28 14:30:00.00  
##  Median :2018-04-13 13:26:00.00   Median :2018-04-13 13:00:00.00  
##  Mean   :2017-12-09 15:26:24.68   Mean   :2017-12-09 14:59:43.81  
##  3rd Qu.:2020-10-20 10:34:30.00   3rd Qu.:2020-10-20 10:00:00.00  
##  Max.   :2023-01-26 14:49:00.00   Max.   :2023-01-26 14:00:00.00  
##   tweet_month         timeofday_hour    
##  Min.   :2009-09-01   Length:19575      
##  1st Qu.:2015-01-01   Class :character  
##  Median :2018-04-01   Mode  :character  
##  Mean   :2017-11-24                     
##  3rd Qu.:2020-10-01                     
##  Max.   :2023-01-01

We start preprocessing the tweets; computing the posting intervals later requires some additional derived properties. The preprocessing transforms the raw tweet data into a structured format suitable for analysis:

# Preprocessing: convert created_at to POSIXct and derive date, weekday, year and month; keep university as character; detect the tweet language and extract emojis. Weekday labels come from the system locale, with the week starting on Monday
tweets <- tweets %>%
  mutate(
    created_at = as.POSIXct(created_at, format = "%Y-%m-%d %H:%M:%S"),
    date = as.Date(created_at),
    day = lubridate::wday(created_at,
      label = TRUE, abbr = FALSE,
      week_start = getOption("lubridate.week.start", 1),
      locale = Sys.getlocale("LC_TIME")
    ),
    year = year(created_at),
    month = floor_date(created_at, "month"),
    university = as.character(university),
    lang = detect_language(full_text),
    full_text_emojis = replace_emoji(full_text, emoji_dt = lexicon::hash_emojis)
  )

# Helper function to remove emoji tags
# replace_emoji() inserts emojis into the text as tags plus their name; we remove the tags here
remove_emoji_tags <- function(text) {
  str_remove_all(text, "<[a-z0-9]{2}>")
}
# Remove Emoji Tags
tweets$full_text_emojis <- sapply(tweets$full_text_emojis, remove_emoji_tags)
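To illustrate what the helper does: judging from the regex, the tags are two-character markers such as `<e3>` (the exact tag format depends on the emoji lexicon used). A base-R equivalent, shown on a made-up input string:

```r
# Base-R sketch of remove_emoji_tags(): strips two-character tags
# like "<e3>" (tag format assumed from the regex in the helper above)
remove_emoji_tags_base <- function(text) {
  gsub("<[a-z0-9]{2}>", "", text)
}

remove_emoji_tags_base("Great news <e3> smiling face")
# "Great news  smiling face"
```

Note that only the tag is removed; the emoji's textual name (here "smiling face") stays in the text for the word-frequency analysis.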

# Store emojis in a separate column to analyze later
tweets$emoji_unicode <- tweets %>%
  emoji_extract_nest(full_text) %>%
  select(.emoji_unicode)

Question 1: How many tweets are being posted by the various Universities when? Are there any ‘release’ strategies visible?

Most Active Hours:

Each university has a distinct peak hour for tweeting, often aligning with typical working hours (9 AM - 5 PM). This suggests a strategic approach to reach their target audience when they are most likely online. The most active hours for each university are as follows:

  • FHNW: 9 AM
  • FH Graubünden: 11 AM
  • ZHAW: 5 PM
  • BFH: 8 AM
  • HES-SO: 10 AM
  • HSLU: 9 AM
  • OST-FH: 8 AM
  • SUPSI-CH: 11 AM

Most peak hours fall in the morning, suggesting that both the social media teams and their audiences are most active on Twitter early in the workday, with activity declining towards the end of it.

Most Active Days:

There isn’t a single “most active day” shared across universities, but every peak falls on a weekday, most often Tuesday. The differences could reflect each university's target audience or the nature of its content.

  • FHNW: Tuesday
  • FH Graubünden: Tuesday
  • ZHAW: Wednesday
  • BFH: Tuesday
  • HES-SO: Tuesday
  • HSLU: Thursday
  • OST-FH: Friday
  • SUPSI-CH: Friday

The pattern also suggests that tweet activity tends to be higher earlier in the week, with motivation and tweet frequency potentially falling as the week progresses.

Release Strategies:

While the universities have peak hours and days, the intervals between tweets vary significantly, indicating a reactive strategy rather than a rigid release schedule. This variability suggests that the universities respond to real-time events or trends instead of sticking to a strict posting calendar.
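The interval logic used in this report, difftime() between each tweet and its predecessor, can be illustrated on a few synthetic timestamps (illustrative values only, not taken from the dataset):

```r
# Synthetic posting times for one account (made-up values)
posted <- as.POSIXct(c(
  "2023-01-10 09:00:00", "2023-01-10 09:30:00",
  "2023-01-12 14:00:00", "2023-01-13 08:15:00"
), tz = "UTC")

# Minutes between consecutive tweets; diff() on sorted timestamps is
# equivalent to difftime(created_at, lag(created_at), units = "mins")
intervals <- as.numeric(diff(posted), units = "mins")
intervals
# 30, 3150, 1095 -- a large spread even within four tweets
```

Even this tiny example shows how a handful of tweets can mix back-to-back posts with multi-day gaps, which is exactly the variability visible in the interval histograms.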

# Count each tweet by university and hour of the day
tweet_counts_by_hour_of_day <- tweets %>%
  group_by(university, timeofday_hour) %>%
  count() %>%
  arrange(university, timeofday_hour)

# Plot the number of tweets by university and hour of the day
ggplot(
  tweet_counts_by_hour_of_day,
  aes(
    x = timeofday_hour, y = n,
    color = university, group = university
  )
) +
  geom_line() +
  facet_wrap(~university) +
  labs(
    title = "Number of tweets by university and hour",
    x = "Hour of day",
    y = "Number of tweets"
  )

# Show most active hours for each university
hours_with_most_tweets_by_uni <- tweet_counts_by_hour_of_day %>%
  group_by(university, timeofday_hour) %>%
  summarize(total_tweets = sum(n)) %>%
  group_by(university) %>%
  slice_max(n = 1, order_by = total_tweets)

print(hours_with_most_tweets_by_uni)
## # A tibble: 8 × 3
## # Groups:   university [8]
##   university     timeofday_hour total_tweets
##   <chr>          <chr>                 <int>
## 1 FHNW           09                      344
## 2 FH_Graubuenden 11                      493
## 3 ZHAW           17                      580
## 4 bfh            08                      497
## 5 hes_so         10                      315
## 6 hslu           09                      380
## 7 ost_fh         08                       44
## 8 supsi_ch       11                      330
# Show most active hour overall
hour_with_most_tweets <- tweet_counts_by_hour_of_day %>%
  group_by(timeofday_hour) %>%
  summarize(total_tweets = sum(n)) %>%
  arrange(desc(total_tweets)) %>%
  slice_max(n = 1, order_by = total_tweets)

print(hour_with_most_tweets)
## # A tibble: 1 × 2
##   timeofday_hour total_tweets
##   <chr>                 <int>
## 1 11                     2356
# Count each tweet by university and weekday
tweet_counts_by_week_day <- tweets %>%
  group_by(university, day) %>%
  count() %>%
  arrange(university, day)

# Plot the number of tweets by university and day of the week
ggplot(
  tweet_counts_by_week_day,
  aes(
    x = day, y = n,
    color = university,
    group = university
  )
) +
  geom_line() +
  facet_wrap(~university) +
  labs(
    title = "Number of tweets by university and day of the week",
    x = "Day of the week",
    y = "Number of tweets"
  )

# Show most active days for each university
days_with_most_tweets_by_uni <- tweet_counts_by_week_day %>%
  group_by(university, day) %>%
  summarize(total_tweets = sum(n)) %>%
  group_by(university) %>%
  slice_max(n = 1, order_by = total_tweets)

print(days_with_most_tweets_by_uni)
## # A tibble: 8 × 3
## # Groups:   university [8]
##   university     day       total_tweets
##   <chr>          <ord>            <int>
## 1 FHNW           Tuesday            575
## 2 FH_Graubuenden Tuesday            751
## 3 ZHAW           Wednesday          636
## 4 bfh            Tuesday            651
## 5 hes_so         Tuesday            415
## 6 hslu           Thursday           603
## 7 ost_fh         Friday              65
## 8 supsi_ch       Friday             461
# Calculate time intervals between tweets
find_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

tweets <- tweets %>%
  arrange(university, created_at) %>%
  group_by(university) %>%
  mutate(time_interval = as.numeric(
    difftime(created_at, lag(created_at), units = "mins")
  ))

# Descriptive statistics of time intervals
summary(tweets$time_interval)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##      0.0    148.2   1128.8   2097.6   2428.3 220707.0        8
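The find_mode() helper above returns the first-encountered value among those with the highest count, so ties resolve in order of appearance. A quick base-R check (the helper is redefined so the snippet is self-contained):

```r
# Mode of a vector: first-seen value with the highest count
find_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

find_mode(c(5, 1, 1, 3, 3))  # tie between 1 and 3: first-seen (1) wins
find_mode(c(2, 7, 7, 7, 2))  # 7
```

This tie-breaking behaviour matters for the per-year interval modes below, where many intervals occur only once.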
# setwd("../4.Text-Mining-Groupwork/plots")
unique_years <- tweets$year %>% unique()
# Plot distribution of time intervals between tweets for each year
for (curr_year in unique_years) {
  # Filter data for the specific year
  filtered_data <- tweets %>%
    filter(year(created_at) == curr_year)

  print(ggplot(filtered_data, aes(x = time_interval)) +
    geom_histogram(fill = "lightblue") +
    facet_wrap(~university) +
    labs(
      title = paste0(
        "Distribution of time intervals between tweets - ", curr_year
      ),
      x = "Time interval (minutes)",
      y = "Tweet count"
    ))
  universities <- filtered_data$university %>% unique()
  for (uni in universities) {
    # Filter data for the specific university
    uni_filtered_data <- filtered_data %>%
      filter(university == uni)

    print(ggplot(uni_filtered_data, aes(x = time_interval)) +
      geom_histogram(fill = "lightblue") +
      labs(
        title = paste0(
          "Distribution of time intervals between tweets for ", uni,
          " in ", curr_year
        ),
        x = "Time interval (minutes)",
        y = "Tweet count"
      ))
    # Calculate mode (most common interval) in hours
    most_common_interval_minutes <- find_mode(uni_filtered_data$time_interval)
    most_common_interval_hours <- most_common_interval_minutes / 60
    print(paste0(
      "Most common time interval for ", uni,
      " in ",
      curr_year,
      " is ", most_common_interval_minutes,
      " minutes (", most_common_interval_hours, " hours)"
    ))
  }
}

## [1] "Most common time interval for FHNW in 2011 is NA minutes (NA hours)"

## [1] "Most common time interval for FH_Graubuenden in 2011 is 23210.3 minutes (386.838333333333 hours)"

## [1] "Most common time interval for hes_so in 2011 is 1.55 minutes (0.0258333333333333 hours)"

## [1] "Most common time interval for FHNW in 2012 is 17324.65 minutes (288.744166666667 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2012 is 0.9 minutes (0.015 hours)"

## [1] "Most common time interval for ZHAW in 2012 is NA minutes (NA hours)"

## [1] "Most common time interval for bfh in 2012 is NA minutes (NA hours)"

## [1] "Most common time interval for hes_so in 2012 is 22086.35 minutes (368.105833333333 hours)"

## [1] "Most common time interval for FHNW in 2013 is 1.26666666666667 minutes (0.0211111111111111 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2013 is 21879.45 minutes (364.6575 hours)"

## [1] "Most common time interval for ZHAW in 2013 is 0.583333333333333 minutes (0.00972222222222222 hours)"

## [1] "Most common time interval for bfh in 2013 is 65.0833333333333 minutes (1.08472222222222 hours)"

## [1] "Most common time interval for hes_so in 2013 is 36252.5833333333 minutes (604.209722222222 hours)"

## [1] "Most common time interval for supsi_ch in 2013 is 0.783333333333333 minutes (0.0130555555555556 hours)"

## [1] "Most common time interval for FHNW in 2014 is 4.58333333333333 minutes (0.0763888888888889 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2014 is 0.183333333333333 minutes (0.00305555555555556 hours)"

## [1] "Most common time interval for ZHAW in 2014 is 0.05 minutes (0.000833333333333333 hours)"

## [1] "Most common time interval for bfh in 2014 is 153.35 minutes (2.55583333333333 hours)"

## [1] "Most common time interval for hes_so in 2014 is 21986.6 minutes (366.443333333333 hours)"

## [1] "Most common time interval for supsi_ch in 2014 is 37496.4833333333 minutes (624.941388888889 hours)"

## [1] "Most common time interval for FHNW in 2015 is 48918.3 minutes (815.305 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2015 is 1139.9 minutes (18.9983333333333 hours)"

## [1] "Most common time interval for ZHAW in 2015 is 0.316666666666667 minutes (0.00527777777777778 hours)"

## [1] "Most common time interval for bfh in 2015 is 20272.0333333333 minutes (337.867222222222 hours)"

## [1] "Most common time interval for hes_so in 2015 is 0.166666666666667 minutes (0.00277777777777778 hours)"

## [1] "Most common time interval for supsi_ch in 2015 is 43496.6333333333 minutes (724.943888888889 hours)"

## [1] "Most common time interval for FHNW in 2016 is 34708.6666666667 minutes (578.477777777778 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2016 is 240.05 minutes (4.00083333333333 hours)"

## [1] "Most common time interval for ZHAW in 2016 is 21.2 minutes (0.353333333333333 hours)"

## [1] "Most common time interval for bfh in 2016 is 0.0833333333333333 minutes (0.00138888888888889 hours)"

## [1] "Most common time interval for hes_so in 2016 is 2.7 minutes (0.045 hours)"

## [1] "Most common time interval for hslu in 2016 is NA minutes (NA hours)"

## [1] "Most common time interval for supsi_ch in 2016 is 1.58333333333333 minutes (0.0263888888888889 hours)"

## [1] "Most common time interval for FHNW in 2017 is 48748.5333333333 minutes (812.475555555556 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2017 is 5617.83333333333 minutes (93.6305555555555 hours)"

## [1] "Most common time interval for ZHAW in 2017 is 6954.43333333333 minutes (115.907222222222 hours)"

## [1] "Most common time interval for bfh in 2017 is 18606.6666666667 minutes (310.111111111111 hours)"

## [1] "Most common time interval for hes_so in 2017 is 71909.9833333333 minutes (1198.49972222222 hours)"

## [1] "Most common time interval for hslu in 2017 is 0.266666666666667 minutes (0.00444444444444444 hours)"

## [1] "Most common time interval for supsi_ch in 2017 is 1.36666666666667 minutes (0.0227777777777778 hours)"

## [1] "Most common time interval for FHNW in 2018 is 0.166666666666667 minutes (0.00277777777777778 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2018 is 1446.23333333333 minutes (24.1038888888889 hours)"

## [1] "Most common time interval for ZHAW in 2018 is 5689.93333333333 minutes (94.8322222222222 hours)"

## [1] "Most common time interval for bfh in 2018 is 20172.05 minutes (336.200833333333 hours)"

## [1] "Most common time interval for hes_so in 2018 is 31170.8333333333 minutes (519.513888888889 hours)"

## [1] "Most common time interval for hslu in 2018 is 0.233333333333333 minutes (0.00388888888888889 hours)"

## [1] "Most common time interval for supsi_ch in 2018 is 0.183333333333333 minutes (0.00305555555555556 hours)"

## [1] "Most common time interval for FHNW in 2019 is 315.233333333333 minutes (5.25388888888889 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2019 is 10079.85 minutes (167.9975 hours)"

## [1] "Most common time interval for ZHAW in 2019 is 1255.61666666667 minutes (20.9269444444444 hours)"

## [1] "Most common time interval for bfh in 2019 is 1440.05 minutes (24.0008333333333 hours)"

## [1] "Most common time interval for hes_so in 2019 is 1140.03333333333 minutes (19.0005555555556 hours)"

## [1] "Most common time interval for hslu in 2019 is 1.95 minutes (0.0325 hours)"

## [1] "Most common time interval for supsi_ch in 2019 is 15 minutes (0.25 hours)"

## [1] "Most common time interval for FHNW in 2020 is 3180.16666666667 minutes (53.0027777777778 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2020 is 2880.03333333333 minutes (48.0005555555556 hours)"

## [1] "Most common time interval for ZHAW in 2020 is 13693.7666666667 minutes (228.229444444444 hours)"

## [1] "Most common time interval for bfh in 2020 is 14531.7333333333 minutes (242.195555555556 hours)"

## [1] "Most common time interval for hes_so in 2020 is 1139.91666666667 minutes (18.9986111111111 hours)"

## [1] "Most common time interval for hslu in 2020 is 120 minutes (2 hours)"

## [1] "Most common time interval for ost_fh in 2020 is NA minutes (NA hours)"

## [1] "Most common time interval for supsi_ch in 2020 is 0.133333333333333 minutes (0.00222222222222222 hours)"

## [1] "Most common time interval for FHNW in 2021 is 0.5 minutes (0.00833333333333333 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2021 is 0.333333333333333 minutes (0.00555555555555555 hours)"

## [1] "Most common time interval for ZHAW in 2021 is 13043.9833333333 minutes (217.399722222222 hours)"

## [1] "Most common time interval for bfh in 2021 is 1411.05 minutes (23.5175 hours)"

## [1] "Most common time interval for hes_so in 2021 is 0 minutes (0 hours)"

## [1] "Most common time interval for hslu in 2021 is 0 minutes (0 hours)"

## [1] "Most common time interval for ost_fh in 2021 is 0.35 minutes (0.00583333333333333 hours)"

## [1] "Most common time interval for supsi_ch in 2021 is 1140 minutes (19 hours)"

## [1] "Most common time interval for FHNW in 2022 is 1439.93333333333 minutes (23.9988888888889 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2022 is 0.1 minutes (0.00166666666666667 hours)"

## [1] "Most common time interval for ZHAW in 2022 is 18623.7166666667 minutes (310.395277777778 hours)"

## [1] "Most common time interval for bfh in 2022 is 7192.96666666667 minutes (119.882777777778 hours)"

## [1] "Most common time interval for hes_so in 2022 is 5798.53333333333 minutes (96.6422222222222 hours)"

## [1] "Most common time interval for hslu in 2022 is 0 minutes (0 hours)"

## [1] "Most common time interval for ost_fh in 2022 is 0.133333333333333 minutes (0.00222222222222222 hours)"

## [1] "Most common time interval for supsi_ch in 2022 is 28800.7333333333 minutes (480.012222222222 hours)"

## [1] "Most common time interval for FHNW in 2023 is 9997.63333333333 minutes (166.627222222222 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2023 is 21962.3833333333 minutes (366.039722222222 hours)"

## [1] "Most common time interval for ZHAW in 2023 is 70740.3333333333 minutes (1179.00555555556 hours)"

## [1] "Most common time interval for bfh in 2023 is 8000.11666666667 minutes (133.335277777778 hours)"

## [1] "Most common time interval for hes_so in 2023 is 4621.1 minutes (77.0183333333333 hours)"

## [1] "Most common time interval for hslu in 2023 is 627.083333333333 minutes (10.4513888888889 hours)"

## [1] "Most common time interval for supsi_ch in 2023 is 7199 minutes (119.983333333333 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2009 is NA minutes (NA hours)"

## [1] "Most common time interval for FH_Graubuenden in 2010 is 55732.2833333333 minutes (928.871388888889 hours)"

## [1] "Most common time interval for hes_so in 2010 is NA minutes (NA hours)"

Question 2: What are the tweets about and how do other Twitter users react to them (likes, etc.)?

Data Preprocessing

langs <- c("de", "fr", "it", "en")
tweets_filtered <- tweets %>%
  filter(lang %in% langs)
# Define extended stopwords (outside loop for efficiency)
# Remove 'amp' as it is not meaningful: it is just the HTML-escaped '&' symbol
# Remove 'rt' (retweet marker); the letter pair also occurs inside words such as 'engagiert'
extended_stopwords <- c(
  "#fhnw", "#bfh", "@htw_chur", "#hslu", "#supsi", "#sups",
  "amp", "rt", "fr", "ber", "t.co", "https", "http", "www", "com", "html"
)
# Create separate DFMs for each language
dfm_list <- list()
for (sel_lang in langs) {
  # Subset tweets for the current language
  tweets_lang <- tweets_filtered %>%
    filter(lang == sel_lang)
  # Create tokens for the current language
  stopwords_lang <- stopwords(sel_lang)
  # Create tokens for all tweets:
  # - create a corpus first, because tokens() only works on character, corpus, list, tokens and tokens_xptr objects
  # - create tokens and remove: URLs, punctuation, numbers, symbols, separators
  # - transform to lowercase
  # - stem all words
  # - create unigrams only (bigrams and trigrams are shown later)
  # - stopwords are removed after stemming, since stemming strips the endings from some words
  tokens_lang <- tweets_lang %>%
    corpus(text_field = "full_text_emojis") %>%
    tokens(
      remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
      remove_url = TRUE, remove_separators = TRUE
    ) %>%
    tokens_tolower() %>%
    tokens_wordstem(lang = sel_lang) %>%
    tokens_ngrams(n = 1) %>%
    tokens_select(
      pattern =
        c(stopwords_lang, extended_stopwords), selection = "remove"
    )
  # Create DFM for the current language
  dfm_list[[sel_lang]] <- dfm(tokens_lang)
}

Content Analysis

Tweets were analyzed across four languages: German, French, Italian, and English. Each university tends to tweet predominantly in one or more languages, reflecting the linguistic diversity of Switzerland.

  • German: Predominantly used by BFH and FHNW. Common words include “neu” (new), “mehr” (more), “schweiz” (Switzerland), and “studier” (study).
  • French: Primarily used by HES-SO. Common words include “projet” (project), “recherch” (research), and “tudi” (study).
  • Italian: Mostly used by SUPSI. Common words include “nuov” (new), “progett” (project), and “student” (student).
  • English: Frequently used by HSLU. Common words include “student”, “project”, “thank”, and “university”.

Word Frequency:

  • English: The most frequent words were “student”, “new”, “@hslu”, “university”, “project”, “thank”, “@zhaw”, “day”, “science”, and “today”.
  • German: The most frequent words were “neu”, “mehr”, “schweiz”, “werd”, “all”, “studier”, “heut”, “hochschul”, “bfh”, and “jahr”.
  • Italian: The most frequent words were “nuov”, “sups”, “progett”, “student”, “present”, “info”, “iscrizion”, “cors”, “ricerc”, and “formazion”.
  • French: The most frequent words were “hes-so”, “right”, “arrow”, “dan”, “projet”, “a”, “tudi”, “haut”, “col”, and “@hes_so”.

It’s important to note that some words like “right” 👉 and “arrow” ➡️ are actually names of parsed emojis and not written words in the tweets.

Word clouds for each language visually depicted the most common words, emphasizing their relative frequencies. The analysis revealed that universities tweet in multiple languages, reflecting the linguistic diversity of their audience. The most common words often related to educational themes, projects, and institutional news, indicating a focus on academic content.

# Word Frequencies & Visualization
words_freqs_en <- sort(colSums(dfm_list$en), decreasing = TRUE)
head(words_freqs_en, 20)
##     student         new       @hslu     univers     project       thank 
##         106          74          70          62          60          60 
##       @zhaw         day      scienc       today       innov         now 
##          59          56          54          52          51          50 
##       swiss switzerland       @fhnw       great          us        join 
##          49          49          46          46          44          43 
##       studi    research 
##          42          42
wordcloud2(data.frame(
  word = names(words_freqs_en),
  freq = words_freqs_en
), size = 0.5)
words_freqs_de <- sort(colSums(dfm_list$de), decreasing = TRUE)
head(words_freqs_de, 20)
##       neu      mehr   schweiz      werd       all   studier      heut hochschul 
##      1586      1104       967       772       706       706       638       601 
##       bfh      jahr     knnen   digital     thema     studi   projekt     welch 
##       577       535       507       499       497       466       465       462 
##      bern     statt     zeigt    arbeit 
##       454       451       437       434
wordcloud2(data.frame(
  word = names(words_freqs_de),
  freq = words_freqs_de
), size = 0.5)
word_freqs_it <- sort(colSums(dfm_list$it), decreasing = TRUE)
head(word_freqs_it, 20)
##        nuov        sups     progett     student     present        info 
##         210         208         173         146         143         143 
##   iscrizion        cors      ricerc   formazion  #supsinews #supsievent 
##         142         141         135         134         134         129 
##       scopr      inform      diplom    bachelor       apert        tutt 
##         123         120         116         111         110         105 
##      master          pi 
##         103         102
wordcloud2(data.frame(
  word = names(word_freqs_it),
  freq = word_freqs_it
), size = 0.5)
# Some apparently English words appear here; these are likely parsed emoji names
word_freqs_fr <- sort(colSums(dfm_list$fr), decreasing = TRUE)
head(word_freqs_fr, 20)
##    hes-so     right     arrow       dan    projet         a      tudi      haut 
##       505       432       324       249       248       234       199       183 
##       col   @hes_so @hessoval    dcouvr      book      open  recherch   #hes_so 
##       155       140       129       127       123       118       117       115 
##     suiss      plus      mast   nouveau 
##       110       105       103        98
wordcloud2(data.frame(
  word = names(word_freqs_fr),
  freq = word_freqs_fr
), size = 0.5)
# University-specific Analysis
for (uni in unique(tweets$university)) {
  # Subset tweets for the current language
  uni_tweets <- tweets_filtered %>%
    filter(university == uni)

  tokens_lang <- uni_tweets %>%
    corpus(text_field = "full_text_emojis") %>%
    tokens(
      remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
      remove_url = TRUE, remove_separators = TRUE
    ) %>%
    tokens_tolower() %>%
    tokens_wordstem() %>%
    tokens_ngrams(n = 1) %>%
    tokens_select(
      pattern =
        c(
          stopwords("en"), stopwords("de"),
          stopwords("fr"), stopwords("it"), extended_stopwords
        ), selection = "remove"
    )
  # Create Data Frame Matrix for uni with all languages
  uni_dfm <- dfm(tokens_lang)
  # Word Frequencies
  uni_word_freqs <- sort(colSums(uni_dfm), decreasing = TRUE)
  # Print most common words: the emoji 'right' appears often.
  # Inside a for loop, results must be printed explicitly.
  print(head(uni_word_freqs, 20))
  print(wordcloud2(data.frame(
    word = names(uni_word_freqs),
    freq = uni_word_freqs
  ), size = 0.5))
}

User Reaction Analysis

A weighted engagement metric was calculated to measure user reactions, considering both likes (favorites) and retweets, with retweets given double weight.

Posting Times of Most Engaged Tweets: The analysis of the posting times of the most engaged tweets (top 1000 by engagement) showed that:

  • The most engaged tweets were posted throughout the day, with a notable peak around mid-morning (11 AM).
  • This pattern aligns with the overall finding that users tend to be more active and engaged during typical working hours. It suggests the audience is online during the day rather than at night, and may itself largely consist of university staff and students.

Content of Most Engaged Tweets:

The most common words in the most engaged tweets included “mehr” (more), “neue” (new), “schweiz” (Switzerland), “schweizer” (Swiss), “right”, “heut” (today), “zeigt” (shows), “#hsluinformatik” (HSLU informatics), “studi” (study), and “zhaw”. Again, “right” and similar terms are names of parsed emojis, not actual words written in the tweets.

# Calculate a 'weighted engagement' metric
tweets <- tweets %>%
  mutate(
    weighted_engagement = favorite_count * 1 + retweet_count * 2
  )

# Identify tweets with the highest weighted engagement
most_engaged_tweets <- tweets %>%
  arrange(desc(weighted_engagement)) %>%
  head(1000) # Top 1000 for analysis

# Analyze posting time of most engaged tweets (same as before)
most_engaged_tweets_time <- most_engaged_tweets %>%
  mutate(time_of_day = format(created_at, "%H"))

ggplot(most_engaged_tweets_time, aes(x = as.numeric(time_of_day))) +
  geom_histogram(binwidth = 1, fill = "lightblue", color = "blue") +
  labs(
    title = "Distribution of Posting Times for Most Engaged Tweets",
    x = "Hour of Day",
    y = "Frequency"
  )

Analyse the content of the most engaged tweets

# Preprocessing content of most liked tweets
tokens_most_engaged <- most_engaged_tweets %>%
  corpus(text_field = "full_text_emojis") %>%
  tokens(
    remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
    remove_url = TRUE, remove_separators = TRUE
  ) %>%
  tokens_tolower() %>%
  tokens_wordstem() %>% # default (English) stemmer; 'sel_lang' was a leftover loop variable
  tokens_ngrams(n = 1) %>%
  tokens_select(
    pattern =
      c(
        stopwords("en"), stopwords("de"),
        stopwords("fr"), stopwords("it"), extended_stopwords
      ), selection = "remove"
  )
tokens_most_engaged_dfm <- dfm(tokens_most_engaged)
freqs_most_engaged <- sort(colSums(tokens_most_engaged_dfm), decreasing = TRUE)
# Print most common words: the emoji 'right' appears often
head(freqs_most_engaged, 20)
##            mehr            neue         schweiz       schweizer           right 
##              81              67              48              47              46 
##            heut           zeigt #hsluinformatik           studi            zhaw 
##              44              41              40              39              39 
##          hes-so           knnen           neuen       hochschul          campus 
##              38              38              36              34              33 
##           innov            gibt              ab      entwickelt             bfh 
##              31              30              30              30              30
set.seed(123)
wordcloud2(data.frame(
  word = names(freqs_most_engaged),
  freq = freqs_most_engaged
), size = 0.5)

The analysis indicates that Swiss Universities of Applied Sciences tweet in multiple languages, reflecting the linguistic diversity of their audience. The tweets often focus on educational themes, projects, and institutional news. User engagement is highest for tweets posted during working hours, with the most engaging content often including timely updates and relevant academic information. Recognizing the role of emojis in enhancing engagement, universities can further optimize their social media strategies to maximize reach and impact.

Question 3: How do the university tweets differ in terms of content, style, emotions, etc.?

### Content Analysis (Word Clouds)

Each university shows distinct patterns in the words and emojis used in their tweets. The analysis involved creating word clouds and identifying the most common words and emojis.

Most Common Words:

  • FHNW: Common words include “mehr” (more), “hochschul” (university), and “studierend” (students).
  • FH Graubünden: Words like “chur” (location), “htw” (university abbreviation), and “busi” (business) are frequent.
  • ZHAW: Frequent words include “zhaw” (university abbreviation), “engineering”, and “schweizer” (Swiss).
  • BFH: Common terms are “bern” (location), “projekt” (project), and “hochschul” (university).
  • HES-SO: Words like “hes-so” (university abbreviation), “projet” (project), and “tudiant” (student) are prevalent.
  • HSLU: Common words are “hslu” (university abbreviation), “luzern” (location), and “hochschul” (university).
  • OST-FH: Frequent terms include “ost” (university abbreviation), “st.gallen” (location), and “podcast”.
  • SUPSI-CH: Words like “supsi” (university abbreviation), “formazion” (education), and “progetto” (project) are prevalent.

Most Common Emojis:

  • FHNW: Top emojis include 👉 (backhand index pointing right), 💛 (yellow heart), and 🖤 (black heart).
  • FH Graubünden: Frequent emojis are 🎉 (party popper), 😃 (grinning face with big eyes), and 😊 (blush).
  • ZHAW: Common emojis include 👉 (backhand index pointing right), ⚡ (high voltage), and 😉 (wink).
  • BFH: Top emojis are 👉 (backhand index pointing right), 🔋 (battery), and 👇 (backhand index pointing down).
  • HES-SO: Common emojis are 👉 (backhand index pointing right), 🎓 (graduation cap), and ➡ (arrow right).
  • HSLU: Top emojis include 🎓 (graduation cap), 👨 (man), and 🚀 (rocket).
  • OST-FH: Frequent emojis are 👉 (backhand index pointing right), ➡ (arrow right), and 🎓 (graduation cap).
  • SUPSI-CH: Common emojis include 👉 (backhand index pointing right), 🎓 (graduation cap), and 🎉 (party popper).
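The per-university emoji counts below rely on a `top_n_emojis()` helper that is not shown in this document. A minimal sketch of what such a helper might look like, assuming a hypothetical `emoji_dict` lookup tibble with columns `unicode`, `emoji_name`, and `emoji_category` (names chosen for illustration only):

```r
# Hypothetical sketch of a top_n_emojis()-style helper; the actual helper used
# below is not shown in this document. Assumes `emoji_dict` is a tibble with
# columns `unicode`, `emoji_name`, and `emoji_category`.
library(dplyr)
library(stringr)
library(tidyr)

top_n_emojis <- function(data, text_col, n_top = 20) {
  pattern <- paste(emoji_dict$unicode, collapse = "|")
  data %>%
    mutate(unicode = str_extract_all({{ text_col }}, pattern)) %>%
    unnest(unicode) %>%                        # one row per emoji occurrence
    count(unicode, sort = TRUE) %>%
    left_join(emoji_dict, by = "unicode") %>%  # attach name and category
    select(emoji_name, unicode, emoji_category, n) %>%
    slice_head(n = n_top)
}
```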

for (uni in unique(tweets$university)) {
  uni_tweets <- tweets %>%
    filter(university == uni, lang %in% langs)
  tokens_uni <- uni_tweets %>%
    corpus(text_field = "full_text_emojis") %>%
    tokens(
      remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
      remove_url = TRUE, remove_separators = TRUE
    ) %>%
    tokens_tolower() %>%
    tokens_wordstem() %>%
    tokens_ngrams(n = 1) %>%
    tokens_select(
      pattern =
        c(
          stopwords("en"), stopwords("de"),
          stopwords("fr"), stopwords("it"), extended_stopwords
        ), selection = "remove"
    )
  uni_dfm <- dfm(tokens_uni)
  freqs_uni <- sort(colSums(uni_dfm), decreasing = TRUE)
  # Print the most common words: emoji-derived tokens such as "right" rank highly
  # (explicit print() is needed inside a for loop)
  print(head(freqs_uni, 20))
  set.seed(123)
  wordcloud2(data.frame(
    word = names(freqs_uni),
    freq = freqs_uni
  ), size = 0.5)

  # Analyze Top Emojis by University
  emoji_count_per_university <- uni_tweets %>%
    top_n_emojis(full_text)

  print(emoji_count_per_university)

  # Wrap the ggplot in print() so it renders inside the for loop
  print(
    emoji_count_per_university %>%
      mutate(emoji_name = reorder(emoji_name, n)) %>%
      ggplot(aes(n, emoji_name)) +
      geom_col() +
      labs(x = "Count", y = NULL, title = paste("Top 20 Emojis Used by", uni))
  )
}
## # A tibble: 20 × 4
##    emoji_name                    unicode emoji_category        n
##    <chr>                         <chr>   <chr>             <int>
##  1 backhand_index_pointing_right 👉      People & Body        56
##  2 yellow_heart                  💛      Smileys & Emotion    34
##  3 black_heart                   🖤      Smileys & Emotion    32
##  4 woman                         👩      People & Body        28
##  5 man                           👨      People & Body        17
##  6 clap                          👏      People & Body        16
##  7 flag_Switzerland              🇨🇭      Flags                15
##  8 microscope                    🔬      Objects              15
##  9 computer                      💻      Objects              14
## 10 graduation_cap                🎓      Objects              13
## 11 school                        🏫      Travel & Places      13
## 12 face_with_medical_mask        😷      Smileys & Emotion    12
## 13 raised_hands                  🙌      People & Body        12
## 14 robot                         🤖      Smileys & Emotion    12
## 15 female_sign                   ♀️       Symbols              10
## 16 trophy                        🏆      Activities            9
## 17 woman_scientist               👩‍🔬      People & Body         9
## 18 party_popper                  🎉      Activities            8
## 19 star_struck                   🤩      Smileys & Emotion     8
## 20 sun_with_face                 🌞      Travel & Places       8
## # A tibble: 20 × 4
##    emoji_name                      unicode emoji_category        n
##    <chr>                           <chr>   <chr>             <int>
##  1 party_popper                    🎉      Activities           18
##  2 grinning_face_with_big_eyes     😃      Smileys & Emotion    15
##  3 blush                           😊      Smileys & Emotion     8
##  4 smiling_face_with_sunglasses    😎      Smileys & Emotion     8
##  5 bulb                            💡      Objects               7
##  6 +1                              👍      People & Body         6
##  7 camera_flash                    📸      Objects               6
##  8 flexed_biceps                   💪      People & Body         6
##  9 four_leaf_clover                🍀      Animals & Nature      6
## 10 grinning_face_with_smiling_eyes 😄      Smileys & Emotion     6
## 11 heart_eyes                      😍      Smileys & Emotion     6
## 12 hugs                            🤗      Smileys & Emotion     6
## 13 female_sign                     ♀️       Symbols               4
## 14 graduation_cap                  🎓      Objects               4
## 15 grinning                        😀      Smileys & Emotion     4
## 16 robot                           🤖      Smileys & Emotion     4
## 17 backhand_index_pointing_down    👇      People & Body         3
## 18 computer                        💻      Objects               3
## 19 lady_beetle                     🐞      Animals & Nature      3
## 20 ocean                           🌊      Travel & Places       3
## # A tibble: 20 × 4
##    emoji_name                    unicode emoji_category        n
##    <chr>                         <chr>   <chr>             <int>
##  1 backhand_index_pointing_right 👉      People & Body        21
##  2 high_voltage                  ⚡      Travel & Places      11
##  3 wink                          😉      Smileys & Emotion     9
##  4 clap                          👏      People & Body         5
##  5 flag_Switzerland              🇨🇭      Flags                 5
##  6 rocket                        🚀      Travel & Places       5
##  7 +1                            👍      People & Body         4
##  8 arrow_right                   ➡️       Symbols               4
##  9 bug                           🐛      Animals & Nature      3
## 10 computer                      💻      Objects               3
## 11 flexed_biceps                 💪      People & Body         3
## 12 man                           👨      People & Body         3
## 13 bangbang                      ‼️       Symbols               2
## 14 dark_skin_tone                🏿      Component             2
## 15 exclamation                   ❗      Symbols               2
## 16 female_sign                   ♀️       Symbols               2
## 17 four_leaf_clover              🍀      Animals & Nature      2
## 18 green_salad                   🥗      Food & Drink          2
## 19 grinning                      😀      Smileys & Emotion     2
## 20 medium_light_skin_tone        🏼      Component             2
## # A tibble: 20 × 4
##    emoji_name                    unicode emoji_category        n
##    <chr>                         <chr>   <chr>             <int>
##  1 backhand_index_pointing_right 👉      People & Body        49
##  2 battery                       🔋      Objects              16
##  3 backhand_index_pointing_down  👇      People & Body        12
##  4 woman                         👩      People & Body        12
##  5 palm_tree                     🌴      Animals & Nature     11
##  6 bulb                          💡      Objects              10
##  7 computer                      💻      Objects              10
##  8 evergreen_tree                🌲      Animals & Nature     10
##  9 graduation_cap                🎓      Objects              10
## 10 party_popper                  🎉      Activities           10
## 11 robot                         🤖      Smileys & Emotion    10
## 12 clap                          👏      People & Body         9
## 13 coconut                       🥥      Food & Drink          9
## 14 date                          📅      Objects               9
## 15 deciduous_tree                🌳      Animals & Nature      9
## 16 flag_Switzerland              🇨🇭      Flags                 9
## 17 rocket                        🚀      Travel & Places       9
## 18 automobile                    🚗      Travel & Places       8
## 19 clinking_glasses              🥂      Food & Drink          8
## 20 seedling                      🌱      Animals & Nature      8
## # A tibble: 20 × 4
##    emoji_name                    unicode emoji_category      n
##    <chr>                         <chr>   <chr>           <int>
##  1 arrow_right                   ➡️       Symbols           320
##  2 arrow_heading_down            ⤵️       Symbols           245
##  3 book                          📖      Objects           115
##  4 mag_right                     🔎      Objects            97
##  5 mega                          📣      Objects            53
##  6 clapper                       🎬      Objects            38
##  7 NEW_button                    🆕      Symbols            35
##  8 computer                      💻      Objects            35
##  9 microscope                    🔬      Objects            32
## 10 bulb                          💡      Objects            29
## 11 police_car_light              🚨      Travel & Places    27
## 12 backhand_index_pointing_right 👉      People & Body      26
## 13 graduation_cap                🎓      Objects            23
## 14 studio_microphone             🎙️       Objects            23
## 15 clap                          👏      People & Body      21
## 16 date                          📅      Objects            17
## 17 medal_sports                  🏅      Activities         15
## 18 memo                          📝      Objects            15
## 19 woman                         👩      People & Body      15
## 20 flag_Switzerland              🇨🇭      Flags              14
## # A tibble: 20 × 4
##    emoji_name                   unicode emoji_category        n
##    <chr>                        <chr>   <chr>             <int>
##  1 sparkles                     ✨      Activities           28
##  2 flag_Switzerland             🇨🇭      Flags                18
##  3 rocket                       🚀      Travel & Places      12
##  4 party_popper                 🎉      Activities           11
##  5 partying_face                🥳      Smileys & Emotion     9
##  6 Christmas_tree               🎄      Activities            7
##  7 clap                         👏      People & Body         7
##  8 star                         ⭐      Travel & Places       7
##  9 bottle_with_popping_cork     🍾      Food & Drink          6
## 10 bulb                         💡      Objects               5
## 11 glowing_star                 🌟      Travel & Places       5
## 12 smiling_face_with_sunglasses 😎      Smileys & Emotion     5
## 13 +1                           👍      People & Body         4
## 14 camera_flash                 📸      Objects               4
## 15 clinking_glasses             🥂      Food & Drink          4
## 16 four_leaf_clover             🍀      Animals & Nature      4
## 17 musical_notes                🎶      Objects               4
## 18 person_running               🏃      People & Body         4
## 19 raised_hands                 🙌      People & Body         4
## 20 robot                        🤖      Smileys & Emotion     4
## # A tibble: 20 × 4
##    emoji_name                    unicode emoji_category        n
##    <chr>                         <chr>   <chr>             <int>
##  1 graduation_cap                🎓      Objects               3
##  2 man                           👨      People & Body         2
##  3 man_student                   👨‍🎓      People & Body         2
##  4 rocket                        🚀      Travel & Places       2
##  5 snowflake                     ❄️       Travel & Places       2
##  6 backhand_index_pointing_right 👉      People & Body         1
##  7 brain                         🧠      People & Body         1
##  8 chocolate_bar                 🍫      Food & Drink          1
##  9 clapper                       🎬      Objects               1
## 10 eyes                          👀      People & Body         1
## 11 fire                          🔥      Travel & Places       1
## 12 flexed_biceps                 💪      People & Body         1
## 13 grinning                      😀      Smileys & Emotion     1
## 14 heart_eyes_cat                😻      Smileys & Emotion     1
## 15 high_voltage                  ⚡      Travel & Places       1
## 16 mantelpiece_clock             🕰️       Travel & Places       1
## 17 sleeping                      😴      Smileys & Emotion     1
## 18 slightly_smiling_face         🙂      Smileys & Emotion     1
## 19 sun                           ☀️       Travel & Places       1
## 20 woman                         👩      People & Body         1
## # A tibble: 20 × 4
##    emoji_name                    unicode emoji_category        n
##    <chr>                         <chr>   <chr>             <int>
##  1 arrow_right                   ➡️       Symbols              83
##  2 backhand_index_pointing_right 👉      People & Body        21
##  3 graduation_cap                🎓      Objects              19
##  4 arrow_forward                 ▶️       Symbols              18
##  5 bulb                          💡      Objects              10
##  6 rocket                        🚀      Travel & Places       9
##  7 party_popper                  🎉      Activities            8
##  8 flag_Switzerland              🇨🇭      Flags                 7
##  9 clap                          👏      People & Body         6
## 10 exclamation                   ❗      Symbols               5
## 11 SOON_arrow                    🔜      Symbols               4
## 12 grinning_face_with_big_eyes   😃      Smileys & Emotion     4
## 13 camera_flash                  📸      Objects               3
## 14 computer                      💻      Objects               3
## 15 movie_camera                  🎥      Objects               3
## 16 rainbow                       🌈      Travel & Places       3
## 17 studio_microphone             🎙️       Objects               3
## 18 woman                         👩      People & Body         3
## 19 Christmas_tree                🎄      Activities            2
## 20 backhand_index_pointing_down  👇      People & Body         2
# Generate general tokens for bigram and trigram analysis
tokens <- tweets %>%
  corpus(text_field = "full_text_emojis") %>%
  tokens(
    remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
    remove_url = TRUE, remove_separators = TRUE
  ) %>%
  tokens_tolower() %>%
  tokens_wordstem() %>%
  tokens_select(
    pattern =
      c(
        stopwords("en"), stopwords("de"),
        stopwords("fr"), stopwords("it"), extended_stopwords
      ), selection = "remove"
  )
# Bigram Wordcloud
bi_gram_tokens <- tokens_ngrams(tokens, n = 2)
dfm_bi_gram <- dfm(bi_gram_tokens)
freqs_bi_gram <- sort(colSums(dfm_bi_gram), decreasing = TRUE)
head(freqs_bi_gram, 20)
##          right_arrow             htw_chur          index_point 
##                  421                  259                  207 
##       backhand_index     hochschul_luzern          point_right 
##                  206                  185                  183 
## berner_fachhochschul        sozial_arbeit              prof_dr 
##                  157                  154                  142 
##            haut_cole      herzlich_gratul            open_book 
##                  141                  139                  117 
##        magnifi_glass           glass_tilt           tilt_right 
##                   97                   97                   97 
##         fh_graubnden  neusten_blogbeitrag   book_#revuehmisphr 
##                   91                   87                   85 
##         social_media         advanc_studi 
##                   84                   83
# Create the bigram word cloud
set.seed(123)
wordcloud2(data.frame(
  word = names(freqs_bi_gram),
  freq = freqs_bi_gram
), size = 0.5)
# Trigram Wordcloud
tri_gram_tokens <- tokens_ngrams(tokens, n = 3)
dfm_tri_gram <- dfm(tri_gram_tokens)
freqs_tri_gram <- sort(colSums(dfm_tri_gram), decreasing = TRUE)
head(freqs_tri_gram, 20)
##         backhand_index_point            index_point_right 
##                          206                          183 
##           magnifi_glass_tilt             glass_tilt_right 
##                           97                           97 
##      open_book_#revuehmisphr   hochschul_gestaltung_kunst 
##                           85                           62 
## dipartimento_tecnologi_innov          master_advanc_studi 
##                           40                           38 
##         depart_sozial_arbeit       #infoanlass_mrz_findet 
##                           36                           33 
##              polic_car_light         univers_appli_scienc 
##                           32                           31 
##         busi_administr_statt     findet_#zrich_infoanlass 
##                           30                           30 
##               tag_offenen_tr        hochschul_life_scienc 
##                           29                           29 
##        gestaltung_kunst_fhnw           mas_busi_administr 
##                           29                           28 
##       mehr_neuen_blogbeitrag     mehr_neusten_blogbeitrag 
##                           28                           28
# Create the trigram word cloud
set.seed(123)
wordcloud2(data.frame(
  word = names(freqs_tri_gram),
  freq = freqs_tri_gram
), size = 0.5)

### LDA Topic Modeling

# Source: Christoph Zangger -> removes all rows containing only zeros
new_dfm <- dfm_subset(dfm_list$en, ntoken(dfm_list$en) > 0)
tweet_lda <- LDA(new_dfm, k = 5, control = list(seed = 123))
# Tidy the LDA results
topic_terms <- tidy(tweet_lda, matrix = "beta")
# Extract topics and top terms
topics <- as.data.frame(terms(tweet_lda, 50)) # First fifty words per topic

# Extract top terms per topic
top_terms <- topic_terms %>%
  group_by(topic) %>%
  top_n(8, beta) %>% # Show top 8 terms per topic
  ungroup() %>%
  arrange(topic, -beta)

# Visualize top terms per topic
top_terms %>%
  mutate(term = reorder_within(term, beta, topic)) %>%
  ggplot(aes(beta, term, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~topic, scales = "free") +
  scale_y_reordered() +
  labs(
    x = "Beta (Term Importance within Topic)",
    y = NULL,
    title = "Top Terms per Topic in Tweets (LDA)"
  )

# Most different words among topics 1-3 (log2 ratios of beta)
diff <- topic_terms %>%
  mutate(topic = paste0("topic", topic)) %>%
  spread(topic, beta) %>%
  filter(topic1 > .001 | topic2 > .001 | topic3 > .001) %>%
  mutate(
    logratio_t1t2 = log2(topic2 / topic1),
    logratio_t1t3 = log2(topic3 / topic1),
    logratio_t2t3 = log2(topic3 / topic2)
  )
diff
## # A tibble: 328 × 9
##    term       topic1  topic2  topic3  topic4  topic5 logratio_t1t2 logratio_t1t3
##    <chr>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>         <dbl>         <dbl>
##  1 @academi… 1.96e-3 5.43e-4 1.53e-3 3.59e-3 3.22e-3        -1.86         -0.358
##  2 @bfh_hesb 1.60e-3 3.51e-3 2.70e-3 4.10e-3 3.54e-3         1.13          0.757
##  3 @ch_univ… 3.20e-4 1.28e-3 6.28e-4 1.06e-4 1.67e-4         2.01          0.975
##  4 @fh_grau… 1.47e-3 4.17e-4 1.24e-4 8.18e-4 1.76e-3        -1.82         -3.58 
##  5 @fhnw     2.59e-3 7.56e-3 7.94e-4 1.15e-3 7.12e-3         1.55         -1.70 
##  6 @fhnwbusi 5.02e-3 3.18e-3 2.27e-3 5.57e-3 6.68e-4        -0.659        -1.15 
##  7 @globalc… 1.89e-3 4.31e-4 2.40e-4 2.90e-4 7.17e-5        -2.13         -2.97 
##  8 @greater… 1.96e-3 1.90e-4 2.21e-3 8.21e-4 2.32e-4        -3.37          0.172
##  9 @grstift… 9.60e-4 2.17e-3 1.41e-3 1.92e-3 2.31e-3         1.17          0.559
## 10 @hes_so   4.52e-4 1.27e-3 3.09e-3 9.90e-4 2.13e-3         1.49          2.77 
## # ℹ 318 more rows
## # ℹ 1 more variable: logratio_t2t3 <dbl>
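The `diff` tibble above is only printed; the log ratios are easier to read as a plot. A quick sketch reusing the columns computed above, here for the terms most skewed between topics 1 and 2:

```r
# Sketch: plot the 15 terms with the largest |log ratio| between topics 1 and 2,
# reusing the `diff` tibble computed above. Positive bars lean towards topic 2,
# negative bars towards topic 1.
library(dplyr)
library(ggplot2)

diff %>%
  slice_max(abs(logratio_t1t2), n = 15) %>%
  mutate(term = reorder(term, logratio_t1t2)) %>%
  ggplot(aes(logratio_t1t2, term, fill = logratio_t1t2 > 0)) +
  geom_col(show.legend = FALSE) +
  labs(
    x = "log2(beta topic 2 / beta topic 1)",
    y = NULL,
    title = "Terms most characteristic of topic 1 vs. topic 2"
  )
```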
# LDA Topic Modeling for each university
universities <- unique(tweets$university)

for (uni in universities) {
  # Filter tweets for the current university
  uni_tweets <- tweets %>% filter(university == uni)

  tokens_uni <- uni_tweets %>%
    corpus(text_field = "full_text_emojis") %>%
    tokens(
      remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
      remove_url = TRUE, remove_separators = TRUE
    ) %>%
    tokens_tolower() %>%
    tokens_wordstem() %>%
    tokens_ngrams(n = 1) %>%
    tokens_select(
      pattern =
        c(
          stopwords("en"), stopwords("de"),
          stopwords("fr"), stopwords("it"), extended_stopwords
        ), selection = "remove"
    )
  uni_dfm <- dfm(tokens_uni)
  # Apply LDA
  uni_dfm <- dfm_subset(uni_dfm, ntoken(uni_dfm) > 0)
  tweet_lda <- LDA(uni_dfm, k = 5, control = list(seed = 123))
  # Tidy the LDA results
  tweet_lda_td <- tidy(tweet_lda)
  # Extract top terms per topic
  top_terms <- tweet_lda_td %>%
    group_by(topic) %>%
    top_n(8, beta) %>%
    ungroup() %>%
    arrange(topic, -beta)
  # Visualize top terms per topic
  p <- top_terms %>%
    mutate(term = reorder_within(term, beta, topic)) %>%
    ggplot(aes(beta, term, fill = factor(topic))) +
    geom_col(show.legend = FALSE) +
    facet_wrap(~topic, scales = "free") +
    scale_y_reordered() +
    labs(
      x = "Beta (Term Importance within Topic)",
      y = NULL,
      title = paste("Top Terms per Topic in Tweets from", uni, "(LDA)")
    )
  print(p)
  # Topic Model Summary: top 10 terms per topic
  cat("\nTopic Model Summary for", uni, ":\n")
  print(as.data.frame(terms(tweet_lda, 10)))
}

## 
## Topic Model Summary for FHNW :
##           Topic 1        Topic 2    Topic 3         Topic 4      Topic 5
## 1        @hsafhnw      @fhnwbusi       fhnw            fhnw    @fhnwbusi
## 2            fhnw       @hsafhnw      @fhnw       @fhnwbusi @fhnwtechnik
## 3           swiss           mehr     campus       hochschul        @fhnw
## 4        challeng           fhnw       mehr    @fhnwtechnik     @hsafhnw
## 5          morgen      hochschul       heut @fhnwpsychologi         mehr
## 6            neue   @fhnwtechnik studierend         projekt      schweiz
## 7  brugg-windisch     studierend  hochschul            neue        neuen
## 8            mehr           prof         ab           index        kunst
## 9           olten             dr   @hsafhnw           basel       campus
## 10         erklrt brugg-windisch       neue           @fhnw         heut

## 
## Topic Model Summary for FH_Graubuenden :
##        Topic 1      Topic 2    Topic 3  Topic 4         Topic 5
## 1         chur  blogbeitrag        htw    statt           statt
## 2  #infoanlass         mehr   #htwchur #htwchur            chur
## 3     #htwchur        neuen       chur   findet          findet
## 4          htw   infoanlass       busi    onlin        #htwchur
## 5  blogbeitrag       findet    neusten      htw           #fhgr
## 6    graubnden        #fhgr   #studium    manag            mehr
## 7        #fhgr    graubnden  graubnden     heut       graubnden
## 8   #graubnden         chur      manag     mehr @suedostschweiz
## 9        thema @htwchurtour infoanlass  product             htw
## 10       #chur           fh    studium    #chur     blogbeitrag

## 
## Topic Model Summary for ZHAW :
##             Topic 1          Topic 2          Topic 3         Topic 4
## 1              zhaw            @zhaw            #zhaw @iam_winterthur
## 2              dank @engineeringzhaw       winterthur         schweiz
## 3  @engineeringzhaw             neue             heut           knnen
## 4             @zhaw        @sml_zhaw             neue            mehr
## 5        winterthur       studierend @engineeringzhaw           #zhaw
## 6             thema        schweizer               cc            neue
## 7         schweizer      #zhawimpact             gibt              cc
## 8             zeigt      @c_caviglia      #zhawimpact           neuen
## 9              heut              via              via           studi
## 10              via               cc  @iam_winterthur       schweizer
##             Topic 5
## 1              zhaw
## 2  @engineeringzhaw
## 3                cc
## 4              mehr
## 5             zeigt
## 6              heut
## 7         @sml_zhaw
## 8             knnen
## 9             studi
## 10             gibt

## 
## Topic Model Summary for bfh :
##          Topic 1       Topic 2    Topic 3   Topic 4         Topic 5
## 1            bfh           bfh       bern       bfh            mehr
## 2           biel         thema       neue      neue             bfh
## 3           bern        berner     berner @bfh_hesb #knoten_maschen
## 4           mehr     @bfh_hesb       mehr      bern            bern
## 5         arbeit fachhochschul      innen     thema          berner
## 6        projekt      @hkb_bfh      neuen   zukunft       schweizer
## 7       @hkb_bfh            ab      zeigt   projekt       @bfh_hesb
## 8  fachhochschul      erfahren      thema   digital           knnen
## 9          zeigt          biel      studi      geht        anmelden
## 10         index         statt nachhaltig       neu            neue

## 
## Topic Model Summary for hes_so :
##          Topic 1  Topic 2  Topic 3     Topic 4     Topic 5
## 1          arrow   hes-so   hes-so     @hes_so       right
## 2            dan    right     haut      projet      projet
## 3          right    arrow     cole      master       arrow
## 4         projet  @hes_so  tudiant     tudiant @hessovalai
## 5           book      dan      dan        open        haut
## 6          suiss     tilt    arrow @hessovalai  professeur
## 7  #revuehmisphr  #hes_so  #hes_so       right      domain
## 8        nouvell recherch     plus      hes-so      master
## 9    @hessovalai  travail programm       diplm        open
## 10      dcouvrez    glass recherch    recherch     nouveau

## 
## Topic Model Summary for hslu :
##            Topic 1         Topic 2            Topic 3         Topic 4
## 1            @hslu       hochschul    #hsluinformatik          luzern
## 2             mehr          luzern              studi            mehr
## 3           luzern           @hslu          interview           @hslu
## 4             neue            mehr              zeigt            neue
## 5  #hsluinformatik         schweiz             depart          finden
## 6           depart            heut           menschen      #hslumusik
## 7           design #hsluinformatik    #hsluwirtschaft            heut
## 8         bachelor         projekt             design         studium
## 9             jahr            gibt #hslusozialearbeit #hsluwirtschaft
## 10       schweizer       schweizer              neuen       digitalen
##       Topic 5
## 1       @hslu
## 2       zeigt
## 3       welch
## 4      luzern
## 5        heut
## 6       knnen
## 7      depart
## 8   schweizer
## 9  studierend
## 10      digit

## 
## Topic Model Summary for ost_fh :
##                       Topic 1                    Topic 2
## 1  #ostschweizerfachhochschul                   @ozg_ost
## 2                     @ost_fh #ostschweizerfachhochschul
## 3                 #informatik                    @ost_fh
## 4                         ost                 ostschweiz
## 5                   st.gallen                        ost
## 6                         neu                       neue
## 7                  @eastdigit              fachhochschul
## 8      #wirtschaftsinformatik                       drei
## 9                    @itrockt                 #countdown
## 10                    podcast                      thema
##                       Topic 3                    Topic 4
## 1                         ost                        ost
## 2  #ostschweizerfachhochschul #ostschweizerfachhochschul
## 3                     @ost_fh                       mehr
## 4                  rapperswil                      leben
## 5                    bachelor                 rapperswil
## 6                      campus               kulturzyklus
## 7                         fhs                    podcast
## 8                   st.gallen                      onlin
## 9                      alumni                   kontrast
## 10                      statt                    @ost_fh
##                       Topic 5
## 1                     @ost_fh
## 2  #ostschweizerfachhochschul
## 3                        mehr
## 4                     @ost_wi
## 5                         ost
## 6                     projekt
## 7                   schweizer
## 8                       neuen
## 9                       thema
## 10                   @ozg_ost

## 
## Topic Model Summary for supsi_ch :
##      Topic 1      Topic 2      Topic 3      Topic 4      Topic 5
## 1      arrow        supsi        supsi     #supsiev         info
## 2      right    #supsinew    formazion        supsi           pi
## 3      supsi     studenti     progetto     bachelor     #supsiev
## 4   #supsiev         oggi        studi    formazion        manag
## 5   progetto       master    #supsinew       master   iscrizioni
## 6  #supsinew          deg        nuovo    tecnologi    formazion
## 7   studenti   iscrizioni @usi_univers        corsi        nuovo
## 8  @supsi_ch informazioni    @supsi_ch       scopri dipartimento
## 9     stream         busi       campus @usi_univers      novembr
## 10  svizzera   ingegneria           pi    @supsi_ch         tema

### Style Analysis

The distribution of tweet lengths varies across universities. Most tweets are concise, in line with Twitter's character limit: the bulk of tweets is around 150 characters long, a length pattern typical of social media posts.

tweets %>%
  mutate(tweet_length = nchar(full_text)) %>%
  ggplot(aes(x = tweet_length)) +
  geom_histogram() +
  labs(title = "Distribution of Tweet Lengths")
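The histogram above pools all universities, so the claimed between-institution differences are not directly visible. A faceted variant (same columns, just split by `university`) would show them; a sketch:

```r
# Sketch: tweet-length histograms per university, to compare the distributions
# instead of pooling them into one plot.
library(dplyr)
library(ggplot2)

tweets %>%
  mutate(tweet_length = nchar(full_text)) %>%
  ggplot(aes(x = tweet_length)) +
  geom_histogram(binwidth = 10) +
  facet_wrap(~university, scales = "free_y") +
  labs(
    title = "Distribution of Tweet Lengths by University",
    x = "Tweet length (characters)", y = "Count"
  )
```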

### Sentiment Analysis

Sentiment analysis was conducted to evaluate the emotional tone of the tweets. The analysis used the Syuzhet method to calculate sentiment scores for each tweet.

Overall Sentiment Trends:

  • The sentiment scores vary over time and by university, showing fluctuations in the emotional tone of the tweets.
  • Positive words commonly found in tweets include terms related to academic achievements, collaborations, and positive experiences.
  • Negative words often relate to challenges, competitions, and issues faced by the universities.

Sentiment by University:

  • FHNW: Positive words include “academy”, “accelerate”, and “activities”. Negative words include “avoid”, “bacteria”, and “challenge”.
  • FH Graubünden: Positive words include “able”, “academic”, and “advantage”. Negative words include “competition”, “corruption”, and “fire”.
  • ZHAW: Positive words include “abilities”, “academic”, and “achievement”. Negative words include “barrier”, “challenge”, and “competition”.
  • BFH: Positive words include “academic”, “access”, and “activities”. Negative words include “aggression”, “competition”, and “fail”.
  • HES-SO: Positive words include “academic”, “active”, and “amazing”. Negative words include “confessions”, “failure”, and “hard”.
  • HSLU: Positive words include “academic”, “access”, and “achievement”. Negative words include “addiction”, “challenge”, and “fail”.
  • OST-FH: Positive words include “announce”, “beautiful”, and “collaboration”. Negative words are minimal, including “dire” and “fire”.
  • SUPSI-CH: Positive words include “academic”, “access”, and “achievement”. Negative words include “barrier”, “cloud”, and “danger”.
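The positive/negative word lists above can be reproduced in outline by scoring individual English tokens with the syuzhet lexicon. A rough sketch (word-level scoring is only an approximation, since syuzhet is designed for full sentences):

```r
# Sketch: score unique English words per university with the syuzhet lexicon
# to surface the most positive and most negative vocabulary.
library(dplyr)
library(tidyr)
library(syuzhet)

word_scores <- tweets %>%
  filter(lang == "en") %>%
  mutate(word = strsplit(tolower(full_text), "[^a-z]+")) %>%
  unnest(word) %>%
  filter(nchar(word) > 2) %>%
  distinct(university, word) %>%
  mutate(score = get_sentiment(word, method = "syuzhet"))

word_scores %>% group_by(university) %>% slice_max(score, n = 3)  # most positive
word_scores %>% group_by(university) %>% slice_min(score, n = 3)  # most negative
```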

# Calculate Sentiment for Supported Languages Only
langs <- c("de", "fr", "it", "en")

tweets_filtered <- tweets %>%
  filter(lang %in% langs)

# Note: the syuzhet lexicon is English-only, so scores for non-English
# tweets are only approximate.
# Create Function to Get Syuzhet Sentiment
get_syuzhet_sentiment <- function(text, lang) {
  # Check if language is supported
  if (lang %in% langs) {
    return(get_sentiment(text, method = "syuzhet", language = lang))
  } else {
    return(NA) # Return NA for unsupported languages
  }
}

# Calculate Syuzhet Sentiment for each Tweet
tweets_filtered$sentiment <-
  mapply(get_syuzhet_sentiment, tweets_filtered$full_text, tweets_filtered$lang)

plot_data <- tweets_filtered %>%
  group_by(university, month) %>%
  summarize(mean_sentiment_syuzhet = mean(sentiment, na.rm = TRUE))

# Plot Syuzhet Sentiment by all Universities
ggplot(plot_data, aes(
  x = month,
  y = mean_sentiment_syuzhet,
  color = university, group = university
)) +
  geom_line() +
  labs(
    title = "Mean Syuzhet Sentiment Over Time by University",
    y = "Mean Sentiment Score"
  ) +
  scale_x_datetime(date_breaks = "1 month", date_labels = "%Y-%m") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
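A side note on the sentiment computation above: for `method = "syuzhet"` the lexicon is English-only and the `language` argument has no effect, so (under that assumption) the per-row `mapply()` could be replaced by a single vectorized call:

```r
# Equivalent vectorized scoring (sketch): get_sentiment() accepts a
# character vector, so no per-row dispatch is needed for the syuzhet method
tweets_filtered$sentiment <- get_sentiment(
  tweets_filtered$full_text,
  method = "syuzhet"
)
```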

for (uni in unique(tweets$university)) {
  uni_tweets <- tweets %>%
    filter(university == uni, lang == "en")

  uni_tweets$sentiment <-
    mapply(get_syuzhet_sentiment, uni_tweets$full_text, uni_tweets$lang)

  plot_data <- uni_tweets %>%
    group_by(month) %>%
    summarize(mean_sentiment = mean(sentiment, na.rm = TRUE))

  # Plot Syuzhet Sentiment Over Time (Per University)
  print(ggplot(plot_data, aes(x = month, y = mean_sentiment, group = 1)) +
    geom_line() +
    geom_smooth(method = "lm", se = FALSE, color = "red") +
    labs(
      title = paste0("Mean Syuzhet Sentiment Over Time: ", uni),
      y = "Mean Sentiment Score",
      x = "Month"
    ))

  # Sentiment scoring per language was not achieved, so the word-level
  # analysis below tokenizes the full_text_emojis column and uses the
  # English tweets only (filtered above)
  # Tokenize and Preprocess Words
  uni_words_en <- uni_tweets %>%
    unnest_tokens(word, full_text_emojis) %>%
    anti_join(get_stopwords(language = "en"), by = "word") %>%
    distinct() %>%
    filter(nchar(word) > 3) %>%
    filter(!str_detect(word, "\\d")) %>%
    filter(!str_detect(word, "https?://\\S+|www\\.\\S+|t\\.co|http|https"))


  sentiment_words_en <- uni_words_en %>%
    mutate(
      sentiment = get_sentiment(word, method = "syuzhet")
    )

  # Separate Positive and Negative Words
  # (words scoring exactly 0 are counted as positive below, which is why
  # neutral terms such as "also" or "among" appear in the positive lists)
  positive_words_en <- sentiment_words_en %>%
    filter(sentiment >= 0) %>%
    count(word, sort = TRUE) %>%
    rename(freq = n)

  negative_words_en <- sentiment_words_en %>%
    filter(sentiment < 0) %>%
    count(word, sort = TRUE) %>%
    rename(freq = n)

  # Create and Display Word Clouds
  # positive words
  print(paste0("Positive words for: ", uni))
  print(head(positive_words_en, 20))
  print(wordcloud2(data.frame(
    word = positive_words_en$word,
    freq = positive_words_en$freq
  ), size = 0.5))

  print(paste0("Negative words for: ", uni))
  print(head(negative_words_en, 20))
  # negative words
  print(wordcloud2(data.frame(
    word = negative_words_en$word,
    freq = negative_words_en$freq
  ), size = 0.5))
}

## [1] "Positive words for: FHNW"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word          freq
##    <chr>      <chr>        <int>
##  1 FHNW       aacsb            1
##  2 FHNW       academy          1
##  3 FHNW       acbtxijqdl       1
##  4 FHNW       accelerate       1
##  5 FHNW       accross          1
##  6 FHNW       activities       1
##  7 FHNW       adorno           1
##  8 FHNW       agreement        1
##  9 FHNW       aline            1
## 10 FHNW       although         1
## 11 FHNW       ambassador       1
## 12 FHNW       america          1
## 13 FHNW       american         1
## 14 FHNW       among            1
## 15 FHNW       amxxbmlfyc       1
## 16 FHNW       anlass           1
## 17 FHNW       anmelden         1
## 18 FHNW       announce         1
## 19 FHNW       announcement     1
## 20 FHNW       announces        1
## [1] "Negative words for: FHNW"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word           freq
##    <chr>      <chr>         <int>
##  1 FHNW       avoid             1
##  2 FHNW       bacteria          1
##  3 FHNW       blatant           1
##  4 FHNW       blind             1
##  5 FHNW       boom              1
##  6 FHNW       breaking          1
##  7 FHNW       challenge         1
##  8 FHNW       cloud             1
##  9 FHNW       competition       1
## 10 FHNW       concerned         1
## 11 FHNW       devastating       1
## 12 FHNW       disadvantaged     1
## 13 FHNW       exhausted         1
## 14 FHNW       forget            1
## 15 FHNW       hype              1
## 16 FHNW       late              1
## 17 FHNW       launch            1
## 18 FHNW       limited           1
## 19 FHNW       mistakes          1
## 20 FHNW       outreach          1

## [1] "Positive words for: FH_Graubuenden"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university     word            freq
##    <chr>          <chr>          <int>
##  1 FH_Graubuenden able               1
##  2 FH_Graubuenden abroad             1
##  3 FH_Graubuenden abstract           1
##  4 FH_Graubuenden abstracts          1
##  5 FH_Graubuenden academic           1
##  6 FH_Graubuenden accu_rate          1
##  7 FH_Graubuenden across             1
##  8 FH_Graubuenden action             1
##  9 FH_Graubuenden activities         1
## 10 FH_Graubuenden address            1
## 11 FH_Graubuenden administration     1
## 12 FH_Graubuenden advantage          1
## 13 FH_Graubuenden advantages         1
## 14 FH_Graubuenden adventurous        1
## 15 FH_Graubuenden alliance           1
## 16 FH_Graubuenden almost             1
## 17 FH_Graubuenden alps               1
## 18 FH_Graubuenden already            1
## 19 FH_Graubuenden also               1
## 20 FH_Graubuenden alumnus            1
## [1] "Negative words for: FH_Graubuenden"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university     word         freq
##    <chr>          <chr>       <int>
##  1 FH_Graubuenden collective      1
##  2 FH_Graubuenden competition     1
##  3 FH_Graubuenden corruption      1
##  4 FH_Graubuenden countdown       1
##  5 FH_Graubuenden fall            1
##  6 FH_Graubuenden fallen          1
##  7 FH_Graubuenden fighting        1
##  8 FH_Graubuenden fire            1
##  9 FH_Graubuenden kick            1
## 10 FH_Graubuenden leave           1
## 11 FH_Graubuenden neglect         1
## 12 FH_Graubuenden neglecting      1
## 13 FH_Graubuenden problem         1
## 14 FH_Graubuenden quiz            1
## 15 FH_Graubuenden rainy           1
## 16 FH_Graubuenden risks           1
## 17 FH_Graubuenden spent           1
## 18 FH_Graubuenden strange         1
## 19 FH_Graubuenden stupidest       1
## 20 FH_Graubuenden sues            1

## [1] "Positive words for: ZHAW"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word          freq
##    <chr>      <chr>        <int>
##  1 ZHAW       _bengraziano     1
##  2 ZHAW       abilities        1
##  3 ZHAW       able             1
##  4 ZHAW       abroad           1
##  5 ZHAW       abstracts        1
##  6 ZHAW       academ           1
##  7 ZHAW       academic         1
##  8 ZHAW       acertainpain     1
##  9 ZHAW       achievement      1
## 10 ZHAW       across           1
## 11 ZHAW       actually         1
## 12 ZHAW       addition         1
## 13 ZHAW       additional       1
## 14 ZHAW       adespydvzf       1
## 15 ZHAW       admits           1
## 16 ZHAW       adopted          1
## 17 ZHAW       advise           1
## 18 ZHAW       advisory         1
## 19 ZHAW       afterwards       1
## 20 ZHAW       agarwaledu       1
## [1] "Negative words for: ZHAW"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word         freq
##    <chr>      <chr>       <int>
##  1 ZHAW       barrier         1
##  2 ZHAW       bastion         1
##  3 ZHAW       break           1
##  4 ZHAW       challenge       1
##  5 ZHAW       cold            1
##  6 ZHAW       competition     1
##  7 ZHAW       conflict        1
##  8 ZHAW       countdown       1
##  9 ZHAW       desert          1
## 10 ZHAW       economic        1
## 11 ZHAW       enough          1
## 12 ZHAW       entitled        1
## 13 ZHAW       fled            1
## 14 ZHAW       foreign         1
## 15 ZHAW       hack            1
## 16 ZHAW       hazard          1
## 17 ZHAW       hidden          1
## 18 ZHAW       ironic          1
## 19 ZHAW       missing         1
## 20 ZHAW       moan            1

## [1] "Positive words for: bfh"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word            freq
##    <chr>      <chr>          <int>
##  1 bfh        _mich_i            1
##  2 bfh        abend              1
##  3 bfh        able               1
##  4 bfh        academic           1
##  5 bfh        accepted           1
##  6 bfh        access             1
##  7 bfh        across             1
##  8 bfh        activities         1
##  9 bfh        addressing         1
## 10 bfh        administration     1
## 11 bfh        agriculture        1
## 12 bfh        alphasolarpro      1
## 13 bfh        alternative        1
## 14 bfh        always             1
## 15 bfh        amarenabrown       1
## 16 bfh        america's          1
## 17 bfh        analysis           1
## 18 bfh        andreasnaef        1
## 19 bfh        annetwh            1
## 20 bfh        announce           1
## [1] "Negative words for: bfh"
## # A tibble: 12 × 3
## # Groups:   university [1]
##    university word            freq
##    <chr>      <chr>          <int>
##  1 bfh        aggression         1
##  2 bfh        broken             1
##  3 bfh        competition        1
##  4 bfh        discrimination     1
##  5 bfh        fail               1
##  6 bfh        forget             1
##  7 bfh        inequality         1
##  8 bfh        labor              1
##  9 bfh        player             1
## 10 bfh        sorry              1
## 11 bfh        stole              1
## 12 bfh        stop               1

## [1] "Positive words for: hes_so"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word            freq
##    <chr>      <chr>          <int>
##  1 hes_so     academic           1
##  2 hes_so     active             1
##  3 hes_so     actors             1
##  4 hes_so     actualitesvd       1
##  5 hes_so     administration     1
##  6 hes_so     administrative     1
##  7 hes_so     admission          1
##  8 hes_so     advanced           1
##  9 hes_so     agencies           1
## 10 hes_so     agenda             1
## 11 hes_so     also               1
## 12 hes_so     amazing            1
## 13 hes_so     amman              1
## 14 hes_so     analyses           1
## 15 hes_so     anne_ramelet       1
## 16 hes_so     announce           1
## 17 hes_so     anounce            1
## 18 hes_so     antonio            1
## 19 hes_so     applications       1
## 20 hes_so     apply              1
## [1] "Negative words for: hes_so"
## # A tibble: 9 × 3
## # Groups:   university [1]
##   university word         freq
##   <chr>      <chr>       <int>
## 1 hes_so     confessions     1
## 2 hes_so     converted       1
## 3 hes_so     fade            1
## 4 hes_so     failure         1
## 5 hes_so     hard            1
## 6 hes_so     intense         1
## 7 hes_so     launch          1
## 8 hes_so     poor            1
## 9 hes_so     vice            1

## [1] "Positive words for: hslu"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word           freq
##    <chr>      <chr>         <int>
##  1 hslu       aacsb             1
##  2 hslu       able              1
##  3 hslu       abstract          1
##  4 hslu       academia          1
##  5 hslu       academic          1
##  6 hslu       accept            1
##  7 hslu       acceptance        1
##  8 hslu       access            1
##  9 hslu       according         1
## 10 hslu       account           1
## 11 hslu       accreditation     1
## 12 hslu       achieving         1
## 13 hslu       action            1
## 14 hslu       additions         1
## 15 hslu       address           1
## 16 hslu       advantage         1
## 17 hslu       afternoon         1
## 18 hslu       agflow            1
## 19 hslu       ahead             1
## 20 hslu       aims              1
## [1] "Negative words for: hslu"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word         freq
##    <chr>      <chr>       <int>
##  1 hslu       addiction       1
##  2 hslu       awaited         1
##  3 hslu       bacteria        1
##  4 hslu       challenge       1
##  5 hslu       competition     1
##  6 hslu       crashes         1
##  7 hslu       cut_up_tv       1
##  8 hslu       dark            1
##  9 hslu       dizzy           1
## 10 hslu       error           1
## 11 hslu       fail            1
## 12 hslu       fall            1
## 13 hslu       fears           1
## 14 hslu       fire            1
## 15 hslu       hack            1
## 16 hslu       laden           1
## 17 hslu       launch          1
## 18 hslu       missed          1
## 19 hslu       mistake         1
## 20 hslu       regulatory      1

## [1] "Positive words for: ost_fh"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word           freq
##    <chr>      <chr>         <int>
##  1 ost_fh     announce          1
##  2 ost_fh     august            1
##  3 ost_fh     backbone          1
##  4 ost_fh     based             1
##  5 ost_fh     beautiful         1
##  6 ost_fh     bridge            1
##  7 ost_fh     business          1
##  8 ost_fh     campus            1
##  9 ost_fh     closer            1
## 10 ost_fh     collaboration     1
## 11 ost_fh     cooling           1
## 12 ost_fh     curious           1
## 13 ost_fh     cyber             1
## 14 ost_fh     deadline          1
## 15 ost_fh     december          1
## 16 ost_fh     delighted         1
## 17 ost_fh     easily            1
## 18 ost_fh     eastern           1
## 19 ost_fh     electricity       1
## 20 ost_fh     emits             1
## [1] "Negative words for: ost_fh"
## # A tibble: 2 × 3
## # Groups:   university [1]
##   university word   freq
##   <chr>      <chr> <int>
## 1 ost_fh     dire      1
## 2 ost_fh     fire      1

## [1] "Positive words for: supsi_ch"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word             freq
##    <chr>      <chr>           <int>
##  1 supsi_ch   _dreicast           1
##  2 supsi_ch   abbg                1
##  3 supsi_ch   abbgroupnews        1
##  4 supsi_ch   abroad              1
##  5 supsi_ch   academ              1
##  6 supsi_ch   academia            1
##  7 supsi_ch   academic            1
##  8 supsi_ch   academies_ch        1
##  9 supsi_ch   access              1
## 10 supsi_ch   accoding            1
## 11 supsi_ch   achieve             1
## 12 supsi_ch   action              1
## 13 supsi_ch   activator           1
## 14 supsi_ch   activities          1
## 15 supsi_ch   address             1
## 16 supsi_ch   administration      1
## 17 supsi_ch   administrations     1
## 18 supsi_ch   admooajcib          1
## 19 supsi_ch   advanced            1
## 20 supsi_ch   advancedstudies     1
## [1] "Negative words for: supsi_ch"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word         freq
##    <chr>      <chr>       <int>
##  1 supsi_ch   barrier         1
##  2 supsi_ch   cloud           1
##  3 supsi_ch   cold            1
##  4 supsi_ch   collision       1
##  5 supsi_ch   critical        1
##  6 supsi_ch   danger          1
##  7 supsi_ch   demand          1
##  8 supsi_ch   demanded        1
##  9 supsi_ch   distracts       1
## 10 supsi_ch   drug            1
## 11 supsi_ch   economic        1
## 12 supsi_ch   eth_rat         1
## 13 supsi_ch   fabrication     1
## 14 supsi_ch   fire            1
## 15 supsi_ch   foreign         1
## 16 supsi_ch   forget          1
## 17 supsi_ch   government      1
## 18 supsi_ch   hard            1
## 19 supsi_ch   intense         1
## 20 supsi_ch   launch          1

Conclusion:

The analysis indicates that Swiss Universities of Applied Sciences exhibit diverse tweeting patterns in terms of content, style, and emotions. Tweets often focus on academic achievements, projects, and institutional news, with varying emotional tones across different universities. Recognizing these patterns can help universities optimize their social media strategies to better engage with their audiences.

## Question 4: What specific advice can you give us as communication department of BFH based on your analysis? How can we integrate the analysis of tweets in our internal processes, can you think of any data products that would be of value for us?

The comprehensive analysis of BFH’s tweets reveals several insights that can be leveraged to enhance the communication strategy.

Language Analysis:

BFH predominantly tweets in German, with 2760 tweets in this language. This aligns with the linguistic preferences of their primary audience.

Emoji Analysis:

The analysis of emoji usage shows that certain emojis are frequently used, which can be leveraged to increase engagement. Popular emojis like 🎓 (graduation cap) and 🚀 (rocket) often signify academic achievements and dynamic growth, resonating well with the audience.

Summary of key insights from the analysis:

# Language Analysis
tweets %>%
  filter(university == "bfh") %>%
  count(lang) %>%
  arrange(desc(n))
## # A tibble: 17 × 3
## # Groups:   university [1]
##    university lang        n
##    <chr>      <chr>   <int>
##  1 bfh        de       2760
##  2 bfh        <NA>      212
##  3 bfh        en         97
##  4 bfh        lb         62
##  5 bfh        fr         31
##  6 bfh        fy          8
##  7 bfh        no          6
##  8 bfh        nl          3
##  9 bfh        af          2
## 10 bfh        cy          2
## 11 bfh        da          2
## 12 bfh        ht          2
## 13 bfh        it          2
## 14 bfh        ru-Latn     2
## 15 bfh        es          1
## 16 bfh        gd          1
## 17 bfh        mt          1
# Emoji Analysis
emoji_count <- tweets %>%
  top_n_emojis(full_text)

emoji_count %>%
  mutate(emoji_name = reorder(emoji_name, n)) %>%
  ggplot(aes(n, emoji_name)) +
  geom_col() +
  labs(x = "Count", y = NULL, title = "Top 20 Emojis Used")

insights <- list(
  "Most Active Hours" = hours_with_most_tweets_by_uni,
  "Most Active Days" = days_with_most_tweets_by_uni,
  "Content Analysis" = head(words_freqs_de),
  "Sentiment Analysis" = head(tweets_filtered$sentiment)
)

Recommendations:

Based on the analysis, the following recommendations can be made to enhance BFH’s communication strategy:

1. Optimize Tweet Release Times: The analysis of tweet activity shows that BFH’s most active hours are typically in the morning. Focusing on releasing tweets during these peak hours can maximize engagement; scheduling important announcements and updates during these times will likely yield better visibility and interaction.
2. Focus on Specific Days for Announcements: The analysis shows that Tuesday is the most active day for BFH tweets. Leveraging this day for critical updates and major announcements can ensure they reach a wider audience. Aligning content release schedules with these high-activity days can enhance communication effectiveness.
3. Leverage Sentiment Analysis: Sentiment analysis indicates the emotional tone of the tweets, helping tailor content to resonate positively with the audience. By understanding which types of tweets generate positive reactions, the communication team can craft messages that are more likely to be well received, for example by highlighting student achievements, successful projects, and positive institutional news.
4. Implement Topic Modeling: Topic modeling reveals the key themes prevalent in the tweets. For BFH, topics often include academic projects, student updates, and digital initiatives. Aligning the communication strategy to emphasize these themes can enhance relevance and engagement, and regularly updating the communication team on trending topics keeps the content aligned with audience interests.
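The peak-time figures cited above can be recomputed directly from `created_at`; a sketch, assuming `lubridate` is available (as suggested by the derived date columns elsewhere in the analysis):

```r
library(dplyr)
library(lubridate)

# Sketch: most active posting hours and weekdays for BFH
bfh_activity <- tweets %>%
  filter(university == "bfh") %>%
  mutate(
    hour = hour(created_at),
    weekday = wday(created_at, label = TRUE)
  )

bfh_activity %>% count(hour, sort = TRUE) %>% head(3)    # top posting hours
bfh_activity %>% count(weekday, sort = TRUE) %>% head(3) # top posting days
```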

Integrating Tweet Analysis into Internal Processes:

To fully leverage these insights, the BFH communication department can integrate tweet analysis into their regular workflow:

1. Real-Time Analytics Dashboard: Implement a dashboard that tracks tweet performance, including engagement metrics, sentiment scores, and topic trends. This allows for real-time adjustments to the communication strategy.
2. Scheduled Reports: Generate weekly or monthly reports summarizing key metrics and insights. This helps the team stay informed about what content is performing well and where improvements can be made.
3. Content Calendar: Develop a content calendar that aligns tweet releases with peak engagement times and days. Incorporate findings from sentiment and topic analyses to plan content that resonates with the audience.
4. Feedback Loop: Establish a feedback loop where the communication team reviews analytics data and adjusts the strategy accordingly. Regular team meetings to discuss these insights can foster a more data-driven approach to communication.
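A minimal sketch of what such a scheduled report could compute (the output file name is an arbitrary example); in practice the script would be run periodically, e.g. via cron or `rmarkdown::render()`:

```r
library(dplyr)
library(lubridate)

# Monthly engagement summary for BFH that a scheduled job could export
monthly_report <- tweets %>%
  filter(university == "bfh") %>%
  mutate(month = floor_date(created_at, "month")) %>%
  group_by(month) %>%
  summarize(
    n_tweets = n(),
    mean_retweets = mean(retweet_count, na.rm = TRUE),
    mean_favorites = mean(favorite_count, na.rm = TRUE),
    .groups = "drop"
  )

write.csv(monthly_report, "bfh_monthly_report.csv", row.names = FALSE)
```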

Potential Data Products:

To further enhance the communication strategy, BFH can consider developing data products that provide additional value:

1. Engagement Prediction Tool: A tool that predicts the best times to tweet based on historical data, optimizing tweet scheduling for maximum engagement.
2. Sentiment Analysis Bot: An automated system that analyzes the sentiment of drafts before they are posted, ensuring that the tone is appropriate and likely to generate positive reactions.
3. Trend Tracker: A feature that identifies emerging topics and trends in real-time, allowing the communication team to quickly adapt and incorporate relevant themes into their messaging.
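As a deliberately naive sketch of the engagement prediction idea (a real product would need a richer feature set and proper validation), favorites can be modeled as a function of posting time:

```r
library(dplyr)
library(lubridate)

model_data <- tweets %>%
  filter(university == "bfh") %>%
  mutate(
    hour = factor(hour(created_at)),
    weekday = wday(created_at, label = TRUE)
  )

# Baseline model: predict favorite_count from posting hour and weekday
fit <- lm(favorite_count ~ hour + weekday, data = model_data)

# Rank posting hours by mean predicted engagement
model_data %>%
  mutate(predicted = predict(fit, model_data)) %>%
  group_by(hour) %>%
  summarize(mean_predicted = mean(predicted), .groups = "drop") %>%
  arrange(desc(mean_predicted)) %>%
  head(3)
```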

By integrating these recommendations and tools, BFH can enhance its communication strategy, ensuring that its messages are timely, relevant, and engaging for its audience.